Dataset Editing Techniques: A Comparative Study

Authors

  • Nidal Zeidat
  • Sujing Wang
  • Christoph F. Eick
Abstract

Editing techniques remove examples from datasets with the goal of obtaining more accurate and faster classifiers. This paper compares several popular dataset editing techniques, including Wilson editing, Citation editing, and Multi-edit, with respect to classification accuracy and training set compression rate. Moreover, supervised clustering editing is introduced, which replaces the examples belonging to a cluster with a cluster representative. Furthermore, we explore the benefits of replacing datasets with their support vectors, as used in Support Vector Machines (SVMs). We also discuss the results of experiments that compare the editing techniques and analyze the relationships between them, using a benchmark consisting of UCI and artificial 2D spatial datasets. Our empirical evaluation shows that editing techniques, in general, significantly improve the classification accuracy of a 1-NN classifier, leading to more efficient and accurate classifiers for most of the datasets tested. The experimental results show strong performance for Wilson, Citation, and supervised clustering editing, and poor performance for Multi-edit and SVM editing, with respect to classification accuracy. Furthermore, the training set compression rates achieved by supervised clustering editing were superior to those of all other editing techniques investigated.
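
To make the idea of editing concrete, the following is a minimal Python sketch of Wilson editing as it is commonly described: an example is dropped when its label disagrees with the majority label of its k nearest neighbours among the remaining examples. This is not the authors' implementation. The function names, the toy 2D dataset, the choice k=3, and the definition of compression rate as the fraction of examples removed are all assumptions made for illustration; the paper may define and compute these differently.

```python
from collections import Counter
import math


def knn_label(query, points, labels, k, skip=None):
    """Majority label among the k nearest points to `query` (Euclidean distance).

    `skip` is an index to leave out, which gives leave-one-out behaviour.
    """
    candidates = [
        (math.dist(query, p), lab)
        for i, (p, lab) in enumerate(zip(points, labels))
        if i != skip
    ]
    candidates.sort(key=lambda t: t[0])
    votes = Counter(lab for _, lab in candidates[:k])
    return votes.most_common(1)[0][0]


def wilson_edit(points, labels, k=3):
    """Keep only the examples whose label agrees with their k nearest neighbours.

    Every example is judged against the full original set, so removals
    do not influence one another.
    """
    keep = [
        i for i in range(len(points))
        if knn_label(points[i], points, labels, k, skip=i) == labels[i]
    ]
    return [points[i] for i in keep], [labels[i] for i in keep]


def compression_rate(n_original, n_edited):
    """Assumed definition: fraction of the training set removed by editing."""
    return 1.0 - n_edited / n_original


# Hypothetical toy 2D dataset: two tight clusters, each containing one
# example that carries the other cluster's label.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.05, 1.05)]
labels = ["a", "a", "a", "b", "b", "b", "b", "a"]

edited_points, edited_labels = wilson_edit(points, labels, k=3)
```

On this toy data the two mislabeled examples are removed and the six consistent ones are kept. A 1-NN classifier built on the edited set can reuse `knn_label` with `k=1` and no `skip` index.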

Publication date: 2005